Column DB faster than row
Row oriented | Coloumn oriented |
---|---|
Consider employee data is stored
|
Same data is stored as:
|
Difference Row vs Column DB
Row Oriented | Column Oriented (Best for Analytical Queries) | |
---|---|---|
SQL | yes | yes |
What | data is stored row by row. Eg: MySQL, Postgres | data is stored column by column. Eg: Amazon Redshift, SAP HANA, Google PowerDrill |
Use Case |
OLAP needs complex queries that
require the processing of large volumes of data to generate insights. Example Finding Average(SELECT AVG(Salary) FROM Employees) Aggregating Queries (SELECT AVG(Salary), MIN(Age), MAX(Age) FROM Employees;) Filtering with Aggregation (SELECT Department, COUNT(*) FROM Employees WHERE Age > 30 GROUP BY Department;) |
OLAP Scenario
What is OLAP(Online Analytical Processing)
-
OLAP scenarios require real-time responses on top of large datasets for complex analytical queries with the following characteristics:
1. Datasets would be massive - billions or trillions of rows
2. Only a few columns are selected to answer any particular query
3. Results must be returned in milliseconds or seconds
4. OLAP scenario is very different from other popular scenarios (such as OLTP or Key-Value access).
5. MongoDB or Redis for analytics will get very poor performance compared to OLAP databases
Advatanges on IO
-
1. For an analytical query, only a small number of table columns need to be read.
For example, if you need 5 columns out of 100, you can expect a 20-fold reduction in I/O
- 2. Data in columns is also easier to compress.
- 3. Due to the reduced I/O, more data fits in the system cache.